[misc] refactor: refactor the folders structure#3
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the repository to rename the core package from verl to verl_omni, focusing on RL training for diffusion and omni-modality models. The changes include comprehensive updates to imports, configuration paths, and project metadata, as well as the removal of legacy PPO configurations. Feedback indicates that version requirements for tensordict, vllm, and verl appear incorrect or refer to non-existent public versions, which could lead to installation failures. Additionally, a suggestion was made to update the import path for TrainingWorkerConfig in the test suite to maintain consistency with the new package structure.
| parent.diffusers_impl = _omni_impl | ||
|
|
||
|
|
||
| def apply_patches() -> None: |
There was a problem hiding this comment.
Maybe we can add log info here, to remind dropping these patches in future version
| cachetools | ||
| codetiming | ||
| datasets | ||
| diffusers |
| __version__ = f.read().strip() | ||
|
|
||
| install_requires = [ | ||
| "verl>=0.7.1", |
There was a problem hiding this comment.
did you try to install in this way?
There was a problem hiding this comment.
No, we hold a temporally version here. I think it should be formally supported with 0.8.0?
…s — client wiring, Hydra rebuild, multi_modal_inputs, eval + docs Round 3 of the Qwen3-TTS GRPO recipe. Addresses all four Codex round-2 findings and lands T15 / T16 / T17 / T18 stubs. 111 recipe-scope tests pass. Codex round-2 blocker fixes: verl-project#1 — Real client wiring. New verl_omni/workers/rollout/_tts_client_patch.py monkey-patches LLMServerManager.get_client to return AutoRegressiveTTSServerClient when rollout.name == "vllm_omni_tts" (falls through to upstream LLMServerClient otherwise). Patch is idempotent and triggered at import time from verl_omni/workers/rollout/replica.py. New tests in tests/workers/test_tts_client_patch.py prove the upstream LLMServerManager now returns a client with generate_tts() for the AR-TTS rollout name. verl-project#2 — Hydra config tree rebuilt to compose correctly. The previous subdirs (actor/rollout/ref/reward) used "# @Package _global_" which double-nested every section, so the trainer saw keys like "rollout.actor_rollout_ref.rollout" instead of "actor_rollout_ref.rollout". Deleted the broken subdirs and replaced the recipe with a single top-level verl_omni/trainer/config/qwen3_tts_trainer.yaml that: - Inherits the diffusion recipe's full defaults tree (diffusion/{actor,ref,rollout,model,model_engine} + data + reward) so all upstream-required sections (ray_kwargs, etc.) are present. - Overrides actor_rollout_ref.rollout for vllm_omni_tts + AR-TTS agent loop / manager. - Overrides algorithm with upstream verl AlgoConfig (adv_estimator=grpo). - Overrides reward.reward_manager to AsrErrorRateRewardManager. - Wires data.custom_cls = Qwen3TTSDataset. verl_omni/trainer/qwen3_tts_grpo/main.py adjusted to load from ../config with config_name=qwen3_tts_trainer. New tests in tests/trainer/test_qwen3_tts_hydra_compose.py compose the shipped yaml via Hydra, apply launch overrides, and verify: - Top-level keys land at expected paths (no double-nesting). - AR-TTS overrides survive composition. - The recipe validator accepts the composed config. - The validator rejects missing ASR URL, flow_grpo, co_located ASR, and n<2 at the real composition layer (not synthetic dicts). verl-project#3 — multi_modal_inputs in worker output. verl_omni/agent_loop/autoregressive_tts_agent_loop.py:_postprocess() now emits a multi_modal_inputs object-array column (empty dict per row, audio recipes have no image/video inputs) plus __num_turns__, so upstream RayPPOTrainer.training_step's unconditional batch.non_tensor_batch["multi_modal_inputs"] iteration succeeds. verl-project#4 — Recipe artifacts finished, no further deferrals: - examples/qwen3_tts_grpo_trainer/eval.sh + run_full.sh + verl_omni/trainer/qwen3_tts_grpo/run_eval.py. - run_eval.py writes eval_results.json with the AC-7 schema (base/RL CER + median/mean duration_ratio), enforces target_utt_id disjointness, and supports mock-inference JSON for offline scoring. - verl_omni/utils/validation_audio_logger.py implements AC-8: writes >=4 generated + ref + (optional target) wavs per validation step plus metrics.json; ArtifactWriteError on disk failures; post-run helper flags zero-artifact steps. - docs/recipe/qwen3_tts_grpo.md documents pipeline, layout, smoke, eval, full run, hardware, validation logging. - examples/qwen3_tts_grpo_trainer/README.md updated — open-items section replaced with eval / full-run / audio-logging sections. Test totals (recipe scope): 111 passed. - dataset 9, audio output 4, reward 19, pairing 4, validator 9, hydra compose 7, run_eval 2, client patch 4, audio logger 5, plus pre-existing recipe-adjacent tests. Still pending: T1 / T2b / T3 / T14 — live-engine smoke against the downloaded Qwen3-TTS-12Hz-0.6B-Base + a running Qwen3-ASR server. Round 4's focus. AI assistance: Claude Code (Opus 4.7, 1M context). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
…t, validation logger wired, real eval path, T1 partial Round 4 of the Qwen3-TTS GRPO recipe. Addresses every Codex round-3 blocker and runs T1's static checkpoint verification. Round-3 blocker fixes: verl-project#1 — Real eval path (B1). run_eval.py no longer NotImplementedErrors on the default launcher path. _live_inference_split() spins up AsyncOmni for each checkpoint, drives one Qwen3-TTS rollout per eval row, captures the generated waveform + duration, and feeds it into _score_inference() which transcribes via the remote ASR endpoint and computes per-row CER + duration_ratio. --mock-inference-json retained as an opt-in TEST-ONLY path that bypasses engine + ASR (used by tests/trainer/test_qwen3_tts_run_eval.py). verl-project#2 — Validation logger wired into runtime (M2). AutoRegressiveTTSAgentLoopWorker.__init__ resolves the validation output dir from config.trainer.validation_data_dir (falling back to default_local_dir/validation_audio) and stores log_validation_step + post_run_check_emitted_artifacts. New _emit_validation_artifacts(...) is called from generate_sequences when batch.meta_info["validate"] is True, persisting AC-8 artifacts (>=4 generated wavs + ref + optional target + metrics.json) per validation step. New tests/agent_loop/test_ar_tts_validation_logging.py exercises the worker call path end-to-end (no GPU). verl-project#3 — Canonical Hydra layout restored (M3). Moved trainer yaml back to verl_omni/trainer/config/qwen3_tts/qwen3_tts_trainer.yaml (per AC-9 and Codex round-3 finding). Two fixes needed to keep composition correct in the subdir: - `# @Package _global_` on the trainer yaml itself, so its content goes to the root namespace (not under qwen3_tts.*). - All defaults use absolute paths (`/diffusion/...`, `/data@data`, `/reward@reward`) so groups resolve against the config search root, not against the qwen3_tts/ subdir. - Hardcoded `dp_diffusion_actor` / `dp_diffusion_ref` instead of the `${diffusion/model_engine}` interpolation, which breaks in the subdir (Hydra namespaces the group choices under qwen3_tts.*). main.py adjusted to config_name="qwen3_tts/qwen3_tts_trainer". Hydra composition tests pass against the canonical layout (7/7). verl-project#4 — T1 partial (analyze gate). Round-4 verifies the static contract: `safetensors.safe_open('.hf_cache/Qwen3-TTS-12Hz-0.6B-Base/model.safetensors')` reports 478 total keys, 76 of them under `speaker_encoder.*`. The model supports voice cloning per the plan. The standalone scripts/qwen3_tts_smoke.py runs the full T1 + T2b verification (live engine launch); it is shipped so round 5 (or a human) can execute the live half against a real ref_audio. T1's fail-closed-if-absent contract is preserved by the script. Files changed: 6 modified, 3 new (scripts/qwen3_tts_smoke.py, tests/agent_loop/test_ar_tts_validation_logging.py, restored qwen3_tts/qwen3_tts_trainer.yaml). 113 recipe-scope tests pass (112 R3 + 2 new R4 worker validation logger tests, less 1 because test_run_eval.py is now stricter about the mock path). Still pending: T1 part 2 (live AsyncOmni inference), T2b (stage-0 logprobs in OmniRequestOutput), T3 (remote ASR HTTP smoke), T14 (end-to-end run_smoke.sh). All four require launching the actual servers, which round 5 will attempt. AI assistance: Claude Code (Opus 4.7, 1M context). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
…dirs, AC-6 metrics in metrics.json, fail-closed speaker check, post-run AC-8 check Round 5 of the Qwen3-TTS GRPO recipe. Addresses every Codex round-4 action item except T1/T2b/T3/T14 live-engine runs. verl-project#1 — Canonical packaging finished (T11). The qwen3_tts/ subdir now contains the per-role files the plan and AC-9 require: verl_omni/trainer/config/qwen3_tts/ ├── qwen3_tts_trainer.yaml (top-level, # @Package _global_) ├── actor/qwen3_tts_actor.yaml (PPO mini/micro batch overrides) ├── ref/qwen3_tts_ref.yaml (KL disabled in v1) ├── rollout/qwen3_tts_rollout.yaml (vllm_omni_tts + AR agent loop) └── reward/qwen3_tts_reward.yaml (AsrErrorRateRewardManager + remote ASR) The trainer yaml's defaults now load each role twice: the diffusion / shared base first, then the qwen3_tts/<role> file on top. Hydra merges the second over the first at the same target path. Composition tests (7/7) pass against the canonical layout. README and docs paths updated to match. verl-project#2 — AC-6 metrics in validation metrics.json (T16). The worker's _emit_validation_artifacts now writes a stable schema with the full AC-6 scalar set (mean_reward, mean_cer, mean_duration_ratio, policy_loss, kl_loss) — mean_reward / mean_cer / mean_duration_ratio are populated from per-sample rewards / CER / waveform durations; policy_loss / kl_loss are null at validation steps (no parameter update happens during validation). Test: tests/agent_loop/test_ar_tts_validation_logging.py now asserts every AC-6 key is present and policy_loss/kl_loss are explicitly null. verl-project#3 — Post-run AC-8 check wired into runtime (T16). main.py's hydra entry now wraps verl.trainer.main_ppo.run_ppo with a try/finally that resolves the validation artifact dir (config.trainer.validation_data_dir or default_local_dir/validation_audio) and calls post_run_check_emitted_artifacts(...) when the dir exists. A run with any validation_step_* directory containing zero generated audio files raises ArtifactWriteError at the end of the run, regardless of whether training itself succeeded. Tests: tests/trainer/test_qwen3_tts_post_run_check.py covers the resolver, the happy path, and the empty-step fail-closed path. verl-project#4 — Fail-closed _check_speaker_encoder (T1 helper). The Hub-id branch that previously returned True when local weights couldn't be found (round-4's fail-open shortcut) now returns False with a clear stderr message instructing the operator to download the checkpoint first. Tests: tests/trainer/test_qwen3_tts_smoke_helper.py covers four cases: no-local-checkpoint, no-speaker-encoder, speaker-encoder-present, unreadable-file. Manual verification: smoke._check_speaker_encoder('.hf_cache/Qwen3-TTS-12Hz-0.6B-Base') → True (real downloaded checkpoint, 76 speaker_encoder keys) smoke._check_speaker_encoder('Qwen/Qwen3-TTS-12Hz-0.6B-Base') → False (Hub id without local download; fail-closed) Tests: 122 recipe-scope tests pass (118 R0-R4 + 4 R5 helper/post-run/AC-6). Live engines (T1 part 2, T2b, T3, T14) — still pending. The structural plumbing is in place: scripts/qwen3_tts_smoke.py drives the full T1+T2b verification against a real engine; the recipe trainer wraps upstream main_ppo with the AC-8 post-run check; the eval pipeline drives AsyncOmni for both checkpoints. Round 6 (or a human operator) executes these against live processes. Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
What does this PR do?
verldependencyChecklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,vllm_omni,rollout,trainer,ci,training_utils,recipe,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,diffusion,omni,tests,docker,like[diffusion, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][diffusion, fsdp] feat: new rollout schedulerTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always